A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams
Authors
Abstract
Over the centuries, cities have been thriving places for citizens owing to their complex infrastructure. The emergence of Cyber-Physical-Social Systems (CPSS) and context-aware technologies has boosted interest in analysing, extracting and, eventually, understanding city events, which can subsequently be used to leverage citizens' observations of their cities. In this paper, we investigate the feasibility of using Twitter textual streams to extract city events. We propose a hierarchical multi-view deep learning approach that contextualises citizen observations of city systems and services such as traffic, public transport, weather, sociocultural activities and public safety as a source of city events. Our goal has been to build a flexible architecture that learns representations useful across tasks, thus avoiding excessive task-specific feature engineering. We apply our approach to a real-world dataset of event reports and tweets collected by [3] over four months in the San Francisco Bay Area, and to additional datasets collected in Greater London. Our evaluations show that the proposed solution outperforms existing models and extracts city-related events with an average accuracy of 81% over all classes. To further evaluate the impact of our Twitter event extraction model, we used two sources of authorised reports: road traffic disruption data collected from the Transport for London API, and sociocultural events parsed from the Time Out London website. The analysis showed that 49.5% of Twitter traffic comments are reported approximately five hours before the authorities' official records. Moreover, we found that, amongst the scheduled sociocultural event topics, tweets reporting transportation, cultural and social events are 31.75% more likely to influence the distribution of Twitter comments than sport, weather and crime topics.
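The lead-time finding above (the share of traffic tweets that appear before the official record) can be illustrated with a small sketch. It assumes, hypothetically, that each tweet has already been matched to a corresponding Transport for London disruption record; the matching step, the helper name `lead_time_stats`, the toy timestamps and the use of the mean as the lead-time summary are all illustrative assumptions, not the paper's method.

```python
from datetime import datetime, timedelta
from statistics import mean


def lead_time_stats(pairs):
    """Summarise how far tweets precede official reports (hypothetical helper).

    `pairs` is a list of (tweet_time, official_time) datetimes, one per tweet
    that has ALREADY been matched to an official disruption record
    (assumed preprocessing, not shown here).

    Returns (fraction of tweets posted before the official record,
             mean lead time in hours over those earlier tweets).
    """
    # Positive lead means the tweet came before the official record.
    leads = [(official - tweet).total_seconds() / 3600.0 for tweet, official in pairs]
    earlier = [h for h in leads if h > 0]
    fraction = len(earlier) / len(leads) if leads else 0.0
    mean_lead = mean(earlier) if earlier else 0.0
    return fraction, mean_lead


# Toy data, invented for illustration only: four matched tweets.
t0 = datetime(2017, 3, 1, 8, 0)
pairs = [
    (t0, t0 + timedelta(hours=5)),     # tweet 5 h before the official record
    (t0, t0 - timedelta(hours=1)),     # tweet 1 h after
    (t0, t0 + timedelta(hours=3)),     # tweet 3 h before
    (t0, t0 - timedelta(minutes=30)),  # tweet 30 min after
]
fraction, mean_lead = lead_time_stats(pairs)
print(f"{fraction:.1%} of tweets precede the record, mean lead {mean_lead:.1f} h")
```

On real data, the hard part is the matching step (deciding which tweet refers to which disruption by location and time window); this sketch only shows the summary computed once that matching exists.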
Similar sources
What Is New in Our City? A Framework for Event Extraction Using Social Media Posts
Post streams from public social media platforms such as Instagram and Twitter have become precious but noisy data sources to discover what is happening around us. In this paper, we focus on the problem of detecting and presenting local events in real time using social media content. We propose a novel framework for real-time city event detection and extraction. The proposed framework first appl...
Event Discovery in Social Media Feeds
We present a novel method for record extraction from social streams such as Twitter. Unlike typical extraction setups, these environments are characterized by short, one sentence messages with heavily colloquial speech. To further complicate matters, individual messages may not express the full relation to be uncovered, as is often assumed in extraction tasks. We develop a graphical model that ...
Real World City Event Extraction from Twitter Data Streams
The immediacy of social media messages means that it can act as a rich and timely source of real world event information. The detected events can provide a context to observations made by other city information sources such as fixed sensor installations and contribute to building ‘city intelligence’. In this work, we propose a novel unsupervised method to extract real world events that may impa...
A Simple Bayesian Modelling Approach to Event Extraction from Twitter
With the proliferation of social media sites, social streams have proven to contain the most up-to-date information on current events. Therefore, it is crucial to extract events from the social streams such as tweets. However, it is not straightforward to adapt the existing event extraction systems since texts in social media are fragmented and noisy. In this paper we propose a simple and yet e...
Automatic targeted-domain spatiotemporal event detection in twitter
Twitter has become an important data source for detecting events, especially tracking detailed information for events of a specific domain. Previous studies on targeted-domain Twitter information extraction have used supervised learning techniques to identify domain-related tweets, however, the need for extensive manual labeling makes these supervised systems extremely expensive to build and mai...
Journal: CoRR
Volume: abs/1705.09975
Publication date: 2017